Sensitive pattern discovery with 'fuzzy' alignments of distantly related proteins

Authors

Andreas Heger

Liisa Holm

Abstract

MOTIVATION: Evolutionary comparison leads to efficient functional characterisation of hypothetical proteins. Here, our goal is to map specific sequence patterns to putative functional classes. The evolutionary signal stands out most clearly in a maximally diverse set of homologues. This diversity, however, leads to a number of technical difficulties. The targeted patterns, as gleaned from structure comparisons, are too sparse for statistically significant signals of sequence similarity and accurate multiple sequence alignment.

RESULTS: We address this problem by a fuzzy alignment model, which probabilistically assigns residues to structurally equivalent positions (attributes) of the proteins. We then apply multivariate analysis to the 'attributes x proteins' matrix. The dimensionality of the space is reduced using non-negative matrix factorization. The method is general, fully automatic and works without assumptions about pattern density, minimum support, explicit multiple alignments, phylogenetic trees, etc. We demonstrate the discovery of biologically meaningful patterns in an extremely diverse superfamily related to urease.
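The dimensionality-reduction step can be illustrated with a minimal sketch. The abstract does not say which non-negative matrix factorization algorithm the authors used; the code below assumes the standard multiplicative update rules for the squared-error objective, and the function name `nmf`, its parameters, and the toy matrix `V` are hypothetical and not taken from the paper. Rows of `V` stand for attributes (structurally equivalent positions) and columns for proteins.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (attributes x proteins) as V ~ W @ H.

    Assumption: multiplicative updates minimising the Frobenius norm;
    the paper's actual algorithm and settings are not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_attr, n_prot = V.shape
    # Random non-negative starting point for both factors.
    W = rng.random((n_attr, rank)) + eps
    H = rng.random((rank, n_prot)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio,
        # so W and H stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: 5 attributes x 4 proteins. Each entry plays the
# role of a fuzzy (probabilistic) assignment weight; values are made up.
V = np.array([
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.8, 0.9],
    [0.5, 0.5, 0.5, 0.5],
])

W, H = nmf(V, rank=2)
error = np.linalg.norm(V - W @ H)  # reconstruction error of the rank-2 model
print("W (attributes x components):\n", W.round(2))
print("H (components x proteins):\n", H.round(2))
print("reconstruction error:", round(float(error), 3))
```

In this reading, each column of `W` is a candidate pattern over attributes and each column of `H` describes how strongly a protein expresses each pattern. The choice of `rank` is an input to the sketch, not something the abstract specifies.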


Related articles

PASS2 version 4: An update to the database of structure-based sequence alignments of structural domain superfamilies

Accurate structure-based sequence alignments of distantly related proteins are crucial for gaining insight into protein domains that belong to a superfamily. The PASS2 database provides alignments of proteins that are related at the superfamily level and characterized by low sequence identity. We thus report an automated, updated version of the superfamily alignment database known as PASS2.4, consis...


The Context-Dependence of Amino Acid Properties

One of the current limitations of using sequence alignments to identify proteins with similar structures is that some proteins with similar structures do not have significant sequence similarity by identity. One way to address this "hidden-homology" problem is to match amino acids based on their chemical and physical properties. However, the amino acid properties overlap, creating orthogonal di...


Combining sequence and structure information in protein alignments

For distantly related proteins, alignments based on structural information are more reliable than traditional sequence alignments. However, when structural comparison leaves some ambiguity in alignment, sequence information can provide valuable additional information to discriminate between multiple alternatives. In this paper we present a Bayesian model that incorporates sequence information int...


Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.

Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key to successful protein structure prediction. Its importance has increased recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods...


A Space-Efficient Approach towards Distantly Homologous Protein Similarity Searches

Protein similarity searches are a routine job for molecular biologists, in which a query sequence of amino acids needs to be compared and ranked against an ever-growing database of proteins. All available algorithms in this field can be grouped into two categories: either solving the problem using sequence alignment through dynamic programming, or employing certain heuristic measures to perform a...



Journal: Bioinformatics

Volume: 19, Suppl 1

Publication year: 2003
